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Abstract 

This  is  the  final  report  for  the  European  Office  of  Aerospace  Research  and  Development 
contract  FA8655-06- 1-3033,  which  started  in  March,  2006.  Reports  from  a  previous  contract 
(FA8655-03-1-3044)  covered  the  literature  review  and  time  domain  solver,  the  formulation  and 
coding  of  the  bifurcation  solver,  and  results  from  various  methods  for  predicting  wing  rock  on¬ 
set,  including  time  marching,  the  bifurcation  solver,  inverse  power  method  and  reduced  modelling 
through  Proper  Orthogonal  Decomposition.  The  work  in  the  current  period  has  involved  the  devel¬ 
opment  and  demonstration  of  a  parallel  bifurcation  solver  and  the  implementation  and  demonstra¬ 
tion  of  reduced  order  modelling  through  centre  manifold  theory.  Ken  Badcock  presented  a  paper 
on  the  parallel  method  at  the  Aerospace  Sciences  Meeting  in  Reno,  January,  2006.  After  this  a 
meeting  was  held  at  the  Navy  Academy  with  Phil  Beran  and  others  to  discuss  the  exploitation  of 
the  techniques  in  Uncertainty  Studies. 

In  accordance  with  the  terms  of  the  contract,  the  authors  certify  that  there  were  no  subject 
inventions  to  declare  during  the  performance  of  this  grant. 


1  Introduction 

Rolling  motion  about  the  longitudinal  axis  of  a  delta  wing  has  been  computed  using  Compu¬ 
tational  Fluid  Dynamics  (CFD)  by  several  researchers  [1,  2,  3,  4],  Recently  a  comprehensive 
numerical  study  of  wing  rock  was  conducted  by  Saad  [7]  using  a  three  degree-of-freedom  flight 
mechanics  model  for  a  generic  fighter  aircraft  configuration  (forebody,  65°  leading  edge  sweep, 
and  vertical  fin).  Roll,  sideslip  and  vertical  degrees  of  freedom  were  included.  Including  the 
sideslip  degree  of  freedom  was  found  to  delay  the  onset  of  wing  rock  and  reduce  the  wing  rock 
amplitude.  These  problems  provide  a  useful  test  case  for  computing  flight  mechanics  instability 
since  they  feature  constrained  motions  and  nonlinear  aerodynamics. 

The  computational  cost  of  time  domain  CFD  based  predictions  can  be  high.  A  way  of  reducing 
the  cost  of  computing  parametric  searches  for  aeroelastic  stability  was  proposed  by  Morton  and 
Beran[8].  Their  method  uses  dynamical  systems  theory  to  characterise  the  nature  of  the  aeroelas¬ 
tic  instability,  with  this  additional  information  concentrating  the  use  of  the  CFD.  In  this  way  the 
problem  of  locating  a  one  parameter  Hopf  bifurcation  was  reduced  from  multiple  time  marching 
calculations  to  a  single  steady  state  calculation  of  a  modified  (or  augmented)  system.  This  aug¬ 
mented  system,  solved  via  Newton’s  method,  calculates  the  value  of  the  parameter  for  which  an 
eigenvalue  of  the  system  Jacobian  matrix  crosses  the  imaginary  axis. 
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This  approach  was  applied  to  compute  the  wing  rock  onset  angle  [14].  The  formulation  used  a 
sparse  matrix  solver  to  efficiently  solve  the  linear  system  arising  at  each  Newton  step  for  the  aug¬ 
mented  system.  A  matrix  free  product  was  used  to  evaluate  the  Jacobian-vector  product  for  the 
augmented  residual,  resulting  in  convergence  to  the  correct  wing  rock  onset  angle  for  a  spatial  dis¬ 
cretisation  of  second  order  accuracy.  However,  the  Newton  iterations  were  driven  to  convergence 
by  the  Jacobian  matrix  of  the  first  order  scheme,  leading  to  a  loss  of  quadratic  convergence.  Note 
that  the  Jacobian  matrix  we  refer  to  is  of  the  coupled  CFD-rolling  motion  system  of  equations,  and 
so  involves  the  derivatives  of  the  CFD  spatial  discretisation  (here  based  on  Osher’s  approximate 
Riemann  solver). 

The  classical  shifted  inverse  power  method  (IPM)  can  be  applied  to  the  Jacobian  matrix  at  any 
angle  of  attack  to  compute  the  interesting  paid  of  the  eigenspectrum.  This  is  useful  to  derive  a 
starting  solution  for  the  augmented  solver,  and  is  a  fast  method  for  finding  the  onset  angle  in  its 
own  right.  However,  because  only  the  Jacobian  of  the  first  order  spatial  scheme  was  available  in 
the  previous  work  it  was  not  possible  to  exploit  the  full  capability  of  the  IPM. 

Recently  the  analytical  Jacobian  for  the  second  order  spatial  discretisation  was  derived  and 
coded  [15].  This  opened  up  the  possibility  of  applying  the  Inverse  Power  Method  to  calculate 
stability.  This  was  exploited  for  predicting  the  wing  rock  onset  angle. 

This  report  describes  the  final  approach  taken  to  a  parallel  IPM  solver,  focussing  on  the  details 
of  the  sparse  linear  solver.  Results  are  presented  to  show  the  performance  of  the  method  for 
computing  the  wing  rock  onset  angle.  Then,  the  formulation  and  predictions  of  the  reduced  order 
model  based  on  centre  manifold  theory  arc  given. 


2  Formulation 

The  coupled  rolling  motion/Euler  equations  can  be  written  to  emphasise  the  dependencies  as 

•E^a(Wa?  </>,  (/>£,  Of) 

qC;c(wa)  +  D(j)t  . 
fit 

Here  the  vector  Ra  denotes  the  discretisation  of  the  spatial  terms  in  the  Euler  equations,  the  de¬ 
tails  of  which  follow  the  formulation  described  in  reference  [18],  which  can  be  summarised  as 
discretisation  using  Osher’s  method  with  MUSCL  interpolation  on  moving  multiblock  meshes. 
The  stability  of  an  equilibrium  w  =  wo  which  satisfies  R(wo)  =  0  is  determined  by  the  eigen- 
system  of  the  Jacobian  matrix  A  =  cJR/dw.  The  matrix  can  be  written  out  in  block  format 
as 
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The  term  Aan  is  a  large  sparse  matrix  which  represents  the  Jacobian  of  the  discretisation  of  the 
fluid  equations  with  respect  to  the  fluid  variables. 

We  consider  the  stability  problem  when  the  angle  of  attack  a  is  the  parameter.  We  are  inter¬ 
ested  in  finding  the  onset  angle  for  the  wing  rock  and  assume  that  stability  is  lost  through  a  Hopf 
Bifurcation.  In  this  case  the  Jacobian  matrix  A  has  a  pair  of  purely  imaginary  eigenvalues  at  the 
critical  angle. 

The  Power  Method  [19]  is  an  algorithm  for  calculating  the  dominant  eigenvalue/eigenvector 
pair  of  any  given  diagonalizable  matrix  A.  Its  extension  to  the  shifted  inverse  power  method  is 
practical  for  finding  any  eigenvalue  provided  that  a  good  initial  approximation  is  known.  Assume 
that  the  n  x  n  matrix  A  has  distinct  eigenvalues  Ai,  A2, . . . ,  \n  and  consider  the  eigenvalue  A j. 
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Then  a  constant  uo  can  be  chosen  so  that  1  / (A j  —  u)  is  the  dominant  eigenvalue  of  the  matrix 

(. A-uI )-\ 


For  initial  guess  xq  and  constant  c o 
For  k  =  1,2,...  Do: 

zk  =  {A  -  aI)~lXk-i  p) 

Pk  —  1 1  Zk  |  |oo 
xk  Zk/  pk 

EndDo 

The  shifted  inverse  power  method  can  be  used  to  calculate  the  location  of  the  critical  eigen¬ 
value  in  the  complex  plane  at  a  fixed  angle  a.  By  computing  the  location  for  multiple  values  of 
a,  the  angle  at  which  the  eigenvalue  crosses  the  imaginary  axis  can  be  computed.  Variations  on 
this  approach  arc  possible  and  arc  formulated  in  detail  in  reference  [14]. 


3  Linear  Solver  -  Serial 


A  key  paid  of  the  computational  work  in  the  IPM  is  the  solution  of  a  large  sparse  linear  system. 

Eisenstat,  Elman  and  Schultz  [17]  developed  a  generalized  Conjugate  Gradient  method  that 
depends  only  on  A  rather  than  ATA  and  is  called  the  Generalized  Conjugate  Residual  (GCR) 
algorithm.  Saad  and  Schultz  developed  the  Generalized  Minimal  Residual  (GMRES)  algorithm 
which  is  mathematically  equivalent  to  GCR  but  is  less  prone  to  breakdown  for  certain  problems 
and  requires  less  storage  and  arithmetic  operations.  However  GCR  remains  the  easier  algorithm 
to  implement,  especially  in  parallel.  The  GCR  algorithm  is: 
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(3) 


To  calculate  the  faj  the  vector  Arj  and  the  previous  Apf  s  arc  required.  The  number  of  matrix 
vector  products  per  step  can  be  reduced  to  one  if  Apj+\  is  calculated  by 


i 

Apj+ 1  =  Arj+ 1  +  ^2  fojAPi  (4) 

i=0 

This  may  not  be  beneficial  if  A  is  sparse  and  j  is  large.  A  restarted  version  called  GCR(m)  is 
defined  so  that  when  the  iteration  reaches  step  m  all  the  pj ’s  and  Apj ’s  arc  thrown  away. 

For  the  Block  Incomplete  Lower  Upper  (BILU)  factorization  the  matrix  is  partitioned  into 
5x5  matrix  blocks  associated  with  each  cell  in  the  mesh.  The  use  of  this  blocking  reduces  the 
memory  required  to  store  the  matrix  in  a  sparse  matrix  format.  For  clarity  matrix  elements  refer 
to  these  blocks  from  now  on. 

Consider  a  general  sparse  matrix  A  whose  elements  arc  aij,  i,j  =  1, . . .  ,n.  A  general  in¬ 
complete  factorization  computes  a  sparse  lower  triangular  matrix  L  and  a  sparse  upper  triangular 
matrix  U  so  that  the  residual  matrix  R  =  LU  —  A  satisfies  certain  contraints,  such  as  having 
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entries  in  a  prescribed  pattern.  A  common  constraint  consists  of  taking  the  zero  pattern  of  the  L 
U  factors  to  be  precisely  the  zero  pattern  of  A.  However  the  accuracy  of  the  ILU(O)  incomplete 
factorization  may  be  insufficient  to  provide  an  adequate  rate  of  convergence. 

More  accurate  block  Incomplete  LU  factorisations  allowing  extra  terms  to  be  tilled  into  the 
factorisation  arc  often  more  efficient  as  well  as  more  robust.  Consider  updating  the  atj  element  in 
full  Gaussian  Elimination  (GE).  The  inner  loop  contains  the  equation 

O-ij  —  n  i]  Vjk  O-kj  ■  (5) 

If  levjj  is  the  current  level  of  element  a,.,  then  the  new  level  is  defined  to  be 

lev,.,  =  min(levjj,  levjfc  +  levfcj-  +  1).  (6) 

The  initial  level  of  fill-in  for  an  element  a,.;  of  a  sparse  matrix  A  is  0  if  atj  ^  0  and  oo 
otherwise.  Each  time  the  element  is  modified  in  the  GE  process  its  level  of  till-in  is  updated  by 
equation  6.  Observe  that  the  level  of  till-in  of  an  element  will  never  increase  during  the  elimina¬ 
tion.  Thus  if  ciij  /  0  in  the  original  matrix  A,  then  the  element  will  have  a  level  of  till-in  equal  to 
zero  throughout  the  elimination  process.  The  above  gives  a  systematic  algorithm  for  discarding 
elements.  Hence  ILU(k)  contains  all  of  the  till-in  elements  whose  level  of  fill-in  does  not  exceed 
k.  The  algorithm  is  given  by: 

For  all  non  zero  elements  define  lev  =  0 

For  i  =  2, . . . ,  n  Do: 

For  j  =  1,2....,  i  —  1  and  for  lev  aKI  <  k 

aij  =  aij/ajj 

ecu  —  O  jjU  ji  l  —  j T  1, . . . ,  n  (7) 

Update  the  levels  of  fill  in  for  non  zero  a, , 

EndDo 

If  lev  aij  >  k  then  a,;,-  =  0 

EndDo 

There  are  two  variations  used  for  the  preconditioning  in  the  current  work.  First  the  linear 
solver  described  above  is  coded  for  complex  variables.  A  complex  variable  formulation  has  3 
advantages  over  splitting  the  variable  into  real  and  imaginary  parts.  First  it  does  not  expand  the 
bandwidth  of  the  matrix.  Secondly  the  precoditioner  is  faster  to  calculate  and  requires  less  storage. 
Thirdly  the  preconditioner  was  found  to  be  more  robust  with  respect  to  dropping  off-diagonal 
terms. 

The  Block  ILU(k)  preconditioner  of  the  Jacobian  matrix  is  also  available  to  precondition  the 
system  with  the  Jacobian  matrix  of  the  second  order  spatial  scheme.  The  factorisation  of  the 
second  order  matrix  can  suffer  from  instability  and  often  makes  a  poorer  preconditioner  than 
the  factorisation  of  the  matrix  from  the  first  order  scheme.  The  first  order  matrix  also  has  the 
advantage  that  it  has  7  non-zero  blocks  per  cell  in  the  fluid  grid,  whereas  the  second  order  matrix 
has  13  non-zero  blocks. 


4  Linear  Solver  -  Parallel 

A  key  develop  for  making  the  IPM  a  practical  proposition  for  full  scale  problems  is  the  develop¬ 
ment  of  a  parallel  implementation  of  the  method.  Calculating  the  steady  state  and  forming  the 
Jacobian  matrix  are  standard  operations  in  parallel  and  will  not  be  discussed.  The  difficult  paid  is 
the  lineal-  solution  and  this  is  described  in  the  current  section. 
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Figure  1:  Effective  data  decomposition  of  the  Jacobian  and  right  hand  side  vector 

4.1  Matrix  Generation  and  Krylov  Method 

The  data  decomposition  is  done  by  storing  whole  blocks  of  the  multiblock  grid  on  a  single  pro¬ 
cessor.  This  means  that  the  partitioning  of  the  grid  can  be  consided  as  a  problem  of  partitioning 
the  blocks.  Each  processor  stores  a  certain  number  of  blocks  in  the  grid  and  their  associated  fluid 
variables.  The  structural  variables  arc  treated  in  a  different  way.  Each  processor  stores  all  the 
structural  information.  This  is  currently  not  too  expensive  as  the  number  of  structural  equations 
is  small  compared  to  the  number  of  fluid  equations,  but  might  have  to  be  revisited  for  different 
problems. 

The  residual  and  Jacobian  arc  decomposed  in  the  method  shown  in  figure  1.  Here  4  processors 
are  assumed  and  the  schematic  in  the  figure  indicates  the  system  components 

Aff  Afs 

Asf  -A-ss 

Processors  with  no  solid  boundary  conditions  will  have  zeros  in  the  Asj  paid  of  the  Jacobian. 

Since  the  Jacobian  matrix  is  calculated  in  4  distinct  parts  Aff,  Afs,  Asf  and  ,4,s,  the  matrix 
vector  product  is  done  in  4  parts  which  are  summed  together.  The  products  with  A  fs  and  arc 
trivial  as  the  structural  unknowns  arc  on  all  processors.  The  product  with  Asf  requires  a  “reduce 
all”  global  sum  at  the  end  of  the  multiply  to  get  the  information  to  all  the  processors. 

The  ordering  of  the  fluid  unknowns  within  each  block  is  important.  The  cells  arc  partitioned 
as  shown  in  figure  2,  with  3  different  sets  of  cells.  The  matrix-vector  product  of  the  internal  set 
can  be  updated  without  any  communication  and  hence  it  is  possible  to  complete  this  paid  of  the 
product  while  waiting  on  the  messages  to  arrive  from  other  processors.  The  border  set  arc  the 
cells  owned  by  this  processor  but  which  require  some  communication  with  other  processors  to  be 
fully  calculated.  The  external  set  is  not  updated  at  all  on  this  processor  but  is  used  to  update  the 
border  points. 

To  maximise  the  parallel  performance  of  the  code  the  blocks  must  be  carefully  assigned  to 
the  processors.  If  there  is  a  large  number  of  blocks  it  is  not  difficult  to  get  approximately  the 
same  workload  onto  each  processor  for  the  CFD  only  case.  However  in  the  IPM  the  workload  for 
non  solid  wall  fluid  points  is  greater  than  that  for  any  other  point  -  due  to  the  contributions  from 
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Figure  2:  The  renumbering  scheme  used  in  the  Jacobian  matrix  Aff 


Asf.  Hence  the  number  of  fluid  points  on  each  processor  and  the  number  of  solid  wall  points  on 
each  processor  should  be  balanced.  Lastly  the  number  of  processor  block  boundaries  needs  to  be 
kept  as  small  as  possible.  Not  only  does  this  keep  the  size  of  the  messages  small  but  much  more 
importantly  BILU(k)  can  be  calculated  with  no  extra  appoximations  on  block  boundaries  if  both 
blocks  arc  on  the  same  processor. 

4.2  Preconditioning 

Now.  the  crucial  issue  is  the  parallel  implementation  of  the  linear  solver.  The  calculation  of 
the  LU  factors  and  the  forward  and  backward  substitutions  on  these  factors  arc  both  sequential 
operations.  It  is  possible  to  ignore  terms  in  the  Jacobian  matrix  A  which  couple  the  system 
between  processors.  However,  this  can  have  a  very  bad  influence  on  the  convergence  of  the 
Krylov  iterations  for  the  systems  being  considered  in  the  IPM.  The  decoupling  of  blocks  through 
dropping  terms  localises  the  approximation  to  the  solution  of  the  linear  system. 

Calculating  LU  factors  only  has  a  limited  amount  of  parallelism.  In  general  a  partitioning 
algorithm  minimizes  the  number  of  interface  elements  by  reducing  the  area  of  the  boundaries. 
Then  each  processor  can  independently  factor  its  set  of  nodes.  This  is  a  completely  local  opera¬ 
tion.  The  interior  nodes  arc  then  eliminated  for  the  interface  rows  forming  a  reduced  matrix  A1 
corresponding  to  interface  nodes  only.  Finally  all  the  processors  factor  A1 .  This  final  operation 
is  where  all  the  difficulty  in  parallel  sparse  factorisations  resides.  The  factorisation  of  A1  is  done 
in  stages  in  a  nested  fashion  until  AT  is  factored.  The  full  factorisation  has  not  been  implemented 
in  the  current  work.  The  library  EUCLID  was  used  to  test  a  par  allel  implementation  of  ILU  and 
the  performance  of  the  algorithm  was  not  found  to  be  satisfactory  due  to  the  inherent  limitations 
of  the  approach. 

To  improve  the  coupling  between  processors  at  the  preconditioning  stage  a  polynomial  pre¬ 
conditioner  was  considered.  In  polynomial  preconditioning  the  matrix  M  is  defined  by 

M~l  =  s(H) 

where  s  is  a  low  degree  polynomial.  The  polynomial  s  is  derived  from  the  Neumann  series 
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expansion 

I  +  N  +  N2  +  N3  +  ...  + Ns  where  N  =  I  -  A. 

Then 

M~1A  =  (1-NS+1). 

There  is  no  limitation  to  implementing  this  preconditioner  in  parallel  because  it  consists  simply 
of  matrix-vector  products. 

The  option  finally  selected  for  the  preconditioner  used  a  combination  of  the  localised  BILU 
preconditioner  with  the  first  order  polynomial  preconditioner.  The  idea  is  that  the  polynomial 
preconditioner  brings  in  some  of  the  global  information  through  matrix-vector  products  that  is 
lost  by  the  localisation  of  the  BILU  factorisation. 


5  Test  Case  and  Time  Marching  Results 

There  is  a  considerable  amount  of  published  experimental  data  for  80°  sweep  delta  wings.  Hence 
this  sweep  angle  was  selected  for  the  current  study.  The  geometry  of  the  wing  was  identical  to  that 
used  by  Arena  and  Nelson  [5].  The  wing  has  a  flat  upper  and  lower  surface  with  a  45°  windward 
bevel  and  a  root  chord  of  0.4222m.  The  moment  of  inertia  for  this  wing  was  given  as  Ixx  = 
0.00125  Kg-m2.  The  experiment  was  performed  at  a  Reynolds  number  of  1.5  x  105.  Since  the 
flow  solver  here  is  compressible,  a  Mach  number  of  0.2  was  used  to  avoid  convergence  difficulties 
at  the  very  low  Mach  numbers  used  in  the  experiments.  The  freestream  air  density  is  used  to  non- 
dimensionalise  the  moment  of  inertia  of  the  wing  for  the  computations.  The  freestream  air  density 
in  the  experiment  was  unavailable  and  so  the  value  was  assumed  to  be  1.23 Kgm~3  (sea-level  ISA 
conditions). 

A  fine  grid  for  the  Euler  simulations  was  created  with  approximately  1.6  million  points.  From 
this  grid  two  levels  were  extracted  by  removing  every  second  grid  point  in  each  direction.  A 
RANS  grid  was  also  generated.  The  comparison  of  the  pressure  distributions  along  a  spanwise  cut 
with  measurements  from  Arena  and  Nelson  [5]  is  shown  in  figure  3.  The  RANS  predictions  agree 
well  with  the  measurements,  with  only  a  slight  under-prediction  of  the  secondary  separation.  The 
medium  grid  Euler  results  predict  the  suction  level  close  to  the  measured  value.  The  Euler  results 
do  not  predict  the  secondary  separation,  as  this  is  associated  with  boundary  layer  separation. 
Simulations  of  free-to-roll  motion  at  15°  and  20°  angle  of  attack  were  conducted  on  the  medium 
grid  using  the  Euler  equations.  The  roll  angle  histories  for  both  simulations  arc  shown  in  figure 
4.  At  15°  angle  of  attack  the  solution  is  dynamically  stable  (i.e.  the  amplitude  of  the  oscillations 
decreases  due  to  aerodynamic  damping).  However  at  20°  angle  of  attack,  the  initial  roll  angle  of 
10°  initiates  a  wing  rock  response  with  the  amplitude  of  the  motion  increasing  with  time. 


6  Prediction  of  Wing  Rock  Onset  Angle 

6.1  Coarse  Grid  Tests 

As  a  first  test  of  the  second  order  Jacobians  and  the  linear-  solver,  the  IPM  was  used  to  calculate  the 
wing  rock  onset  angle  on  the  coarse  grid.  3  Levels  of  fill-in  were  used  for  the  preconditioner  which 
allowed  the  residual  of  the  linear-  system  to  be  driven  down  1 1  orders  in  less  than  60  iterations. 

At  22  degrees,  after  7  inverse  power  iterations  the  critical  eigenvalue  had  converged  to  6 
significant  figures.  The  value  is  —1.381245  x  10~4  ±  (0.223497.  At  23  degrees  the  converged 
value  is  1.805895  x  10-4  ±  (0.232095.  These  angles  bracket  the  onset  angle,  which  can  be 
estimated  by  linear-  interpolation  as  22.4  degrees.  This  is  in  agreement  with  the  results  of  other 
methods  for  the  coarse  grid  [14]. 
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Figure  3:  Grid  refinement  study  -  Surface  pressure  distributions  at  60%cr.  Lines  indicate  the  coarse, 
medium  and  fine  Euler  results. 


Figure  4:  Wing  rock  roll  angle  histories  predicted  using  the  Euler  equations  on  the  medium  grid  - 
a  =  15°  and  20° 
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Figure  5:  Linear  solver  convergence  for  BILU(k)  and  Polynomial/BILU(2)  on  the  coarse  grid  and  1 
processors. 

Next,  the  use  of  the  first  order  Jacobian  to  calculate  the  preconditioner  for  the  linear  system 
which  uses  the  second  order  Jacobian  is  examined.  The  system  used  to  test  the  linear  solver  was 
generated  at  21  degrees  incidence.  There  arc  two  preconditioners  used:  the  BILU  factorisation 
with  1,  2  and  3  levels  of  fill-in,  and  the  Polynomial  preconditioner  combined  with  BILU(2).  The 
calculations  shown  in  figure  5  were  done  on  1  processor  and  so  the  BILU  factorisation  involves  no 
extra  approximation.  We  first  evaluate  the  results  in  terms  of  iterations  to  convergence  since  we 
arc  ultimately  concerned  with  how  the  performance  of  the  Krylov  method  scales  in  parallel.  The 
number  of  iterations  to  convergence  reduces  as  expected  as  the  level  of  fill-in  is  increased.  The 
number  of  iterations  in  each  case  is  significantly  larger  than  when  the  preconditioner  is  calculated 
for  the  second  order  Jacobian  matrix,  when  3  levels  of  fill-in  arc  required.  The  increase  in  the 
number  of  iterations  is  from  60  to  168  although  each  iteration  is  around  twice  as  expensive  for  the 
second  order  matrix  preconditioner.  Using  the  polynomial  preconditioner  on  top  of  BILU(2)  in 
this  case  does  nothing  to  change  the  number  of  iterations  to  convergence. 

The  cost  of  computing  the  eigenvalues  in  terms  of  multiples  of  steady  state  calculation  cost  is 
2.80  for  BILU(O),  2.34  for  BILU(l),  2.44  for  BILU(2)  and  3.70  for  the  Polynomial  preconditioner 
in  addition. 

Next  we  test  the  scaling  of  the  methods  in  parallel.  Here  the  BILU  factorisations  arc  blocked 
on  each  processor  and  it  is  the  influence  of  this  that  is  the  key  concern.  The  convergence  using 
BILU(2)  with  and  without  polynomial  preconditioning  on  one  and  four  processors  is  shown  in 
figure  6.  The  number  of  iterations  using  the  BILU  preconditioner  roughly  doubles  between  1  and 
4  processors.  Using  the  polynomial  preconditioner  actually  reduces  the  number  of  iterations  to 
convergence  when  moving  from  1  to  4  processors.  These  results  arc  encouraging  in  the  sense  that 
the  polynomial  precondioner  is  bringing  in  enough  global  information  about  the  solution  of  the 
system  to  allow  scaling  to  larger  numbers  of  processors. 
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Figure  6:  Linear  solver  convergence  for  BILU(2)  and  Polynomial/BILU(  1 )  on  the  medium  grid  and  1 
and  4  processors. 

6.2  Medium  Grid  Tests 

The  test  of  the  parallel  solver  is  whether  the  medium  grid  can  be  solved,  which  it  cannot  in 
serial  because  of  the  memory  requirements  of  the  method  (estimated  5.6Gb  to  store  the  Jacobian 
and  preconditioner).  To  evaluate  this  the  medium  grid  was  calculated  on  16  processors.  The 
convergence  of  the  linear  solver  is  shown  in  figure  7.  The  system  used  here  was  generated  at  15 
degrees  incidence. 

First  the  BILU(2)  results  arc  shown  for  reference  and  the  performance  of  the  linear  solver 
becomes  very  poor  due  to  the  localisation  of  the  precondioner  on  each  processor.  The  polynomial 
preconditioner  significantly  cuts  the  number  of  iterations  to  convergence.  Results  arc  shown  using 
different  levels  of  fill-in,  and  the  number  of  iterations  to  convergence  cluster  between  200  and  260 
iterations.  The  number  of  iterations  will  inevitably  increase  as  the  size  of  the  grid  increases,  even 
sequentially. 

The  performance  of  the  IPM  is  summarised  in  table  1.  First,  the  improvement  in  the  per¬ 
formance  of  the  lineal-  solver  from  using  the  polynomial  preconditioner  is  clear.  The  number  of 
lineal'  solver  steps  to  convergence  is  reduced  by  roughly  a  factor  of  3.  The  resulting  improvement 
in  the  time  to  convergence  of  the  IPM  is  roughly  25-40%.  Finally,  the  table  shows  the  bifurcation 
happens  between  16  and  17  degrees.  This  is  consistent  with  the  behaviour  of  the  time  domain 
solver  shown  in  figure  4. 


7  Reduced  Order  Model  Implementation 

7.1  Summary  of  Method 

The  wing  rock  onset  angle  is  only  one  part  of  the  information  needed  about  the  system.  In  addition 
the  damping  below  the  critical  angle  and  the  limit  cycle  amplitude  above  it  are  required.  From 
the  stability  calculation  we  have  the  critical  angle,  the  frequency  and  the  critical  eigenvector.  Two 


11 


Figure  7:  Linear  solver  convergence  for  BILU(2)  and  Polynomial/BILU(k)  on  the  medium  grid  and 
16  processors. 


a 

CPU 

CPU 

Linear 

Linear 

Eigenvalue 

Polynomial 

BILU(2) 

Polynomial 

BILU(2) 

15° 

2.91 

3.80 

229 

596 

±0.1727  -  7.53  x  10“4 

16° 

2.53 

4.08 

214 

641 

±0.183?:  -  2.25  x  10~4 

17° 

2.31 

4.00 

202 

704 

±0.1927  ±2.01  x  10-4 

18° 

2.69 

4.38 

265 

774 

±0.2027  ±6.32  x  10~4 

Table  1:  Summary  of  Timings  on  Medium  Grid  on  16  processors 


methods  were  presented  in  the  previous  report  to  exploit  this  information  to  develop  a  small  order 
model  for  the  prediction  of  this  extra  information. 

The  implementation  of  the  first  method  is  described  below.  This  method  has  the  following 
features: 

•  change  of  variables  by  projecting  the  full  order  solution  onto  the  critical  eigenvector 

•  Taylor  expansion  of  the  full  order  residual,  up  to  and  including  third  order  terms 

•  Use  of  centre  manifold  theory  to  include  the  influence  of  the  non-critical  part  of  the  solution 
on  the  critical  paid. 

The  current  report  describes  the  calculation  of  the  difficult  terms  arising  from  the  higher  order 
derivatives,  and  then  presents  the  initial  results  for  the  roll  response. 


7.2  Calculation  of  High  Order  Derivatives 

As  described  in  the  previous  report,  the  method  of  model  reduction  is 
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1.  calculate  ao,  p,  q  and  uj  from  the  direct  solver.  Here  q  is  the  critical  right  eigenvector  of  the 
Jacobian  matrix,  p  is  the  critical  right  eigenvector  of  the  transpose  of  the  Jacobian  matrix, 
ao  is  the  wing  rock  onset  angle  and  ±iui  is  the  critical  eigenvalue. 

2.  calculate  B(q,  q)  and  /J(q.  q).  Here  B  is  the  second  Jacobian  operator 

m  x  1  d2R 

Wl,W2)  =  2^2WlW2 


3.  solve 

f  (2iul  -  A)h20  =  B( q,  q) 
1  Ahn  =  B(q,q) 


4.  form  C'(q,  q,  q).  Here 

C  (w[ ,  w2,  w3) 

is  the  third  Jacobian  operator. 

5.  form  g20,  gn,  g2\  where 


1  d3R 

-__-WlW2W3 

o  am1 


(8) 


520  =  (p,  B{ q,  q)}  gu  =  (p,  B(q,  q)) 


and 


521  =  (p,C(q,q,q)) 

-  2(p,  B( q,  A~1B(q,  q)))  +  (p,  B( q,  (2 iul  -  A)~1B( q,  q))) 

+  —  (P,B{q,q)){p,B{q,q)) 

IU) 

-  —  |(p,H(q,q))|2  - -^-|(p,H(q,q))|2 

VjJ  SVjJ 

6.  calculate  the  terms  from  the  parameter  expansion 


9R 

«oo  —  (q,  -q^) 


«oi  =  (q,  Aa  p) 
aw  =  (q,  Aap) 


Note  that  the  terms  arising  from 

a2  =  (q,  Aa  y) 

lead  to  quadratic  terms  in  z  and  z  after  exploiting  of  the  centre  manifold  expression.  These 
terms  were  found  to  be  small  and  so  were  approximated  as  zero  in  the  final  calculations. 

7.  time  march 

z  =  iwz+  i 5202: 2  +  5n^  +  ^502  +  ^521  ^  +  d(«oo  +  aoi^  +  «io  z) 


For  the  implementation  a  number  of  issues  were  tackled.  First,  the  parameter  terms,  including 
the  term  Aa  was  calculated.  This  was  done  using  finite  differences,  requiring  only  2  residual 
evaluations. 

Secondly,  the  second  and  third  order  Jacobian  products  were  calculated  using  matrix  free 
products  against  q,  p  and  their  complex  conjugates.  This  was  tackled  in  previous  work  on  aeroe- 
lastic  instabilities.  The  first  step  is  to  rewrite  the  required  derivatives  in  real  and  imaginary  parts 
as  follows  Denoting 

Q  =  Qi+iQ2,  Q  €  Cn  Qi,Q2  €  Rn. 
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then  the  following  identities  can  be  derived 

B( Q,Q)  =  R(Qi,  Qi)  -  B(Q2,  Q2)  +  2iB(Q1,  Q2) 

B{  Q,Q)_  =  -B(Qi,Qi)  +  -B(Q2,Q2) 

C( Q,Q,Q)  =  (^(Qi,  Qi,  Qi)  +  C(Qi,  Q2,  Q2)  +  *C*(Qi,  Qi,  Qo)  +  iC'(Q2,  Q2,  Q2)- 

Further,  the  following  identities  can  be  derived: 

B(y  +  w,  v  +  w)  =  B(v,  v)  +  2B(v,  w)  +  B(w,  w) 

B(v  —  w,v  —  w)  =  B(v,  v)  —  2B(v,  w)  +  B{w,  w) 

so  that  B(v,  w )  can  be  expressed  as 

B(v,  w)  =  -  [B(v  +  w,v  +  w)  —  B(v  —  w,  v  —  w)]  . 

A  similar  set  of  identities  hold  for  C 

C(v  +  w,  v  +  w,  v  +  w)  =  C(v,  v,  v)  +  3C(v,  v,  w)  +  3C(v,  w,  w)  +  C(w,  w,  w) 

C(y  —  w,v  —  w,v  —  w)  =  C(v,  v,  v)  —  3C(v,  v,  w)  +  2>C{v,  w,  w)  +  C(w,  w,  w) 

and  hence  C(v,v,w)  can  be  expressed  as 

C(v,v,w )  =  -  [C{y  +  w,  v  +  w,  v  +  w)  —  C(v  —  w,  v  —  w,  v  —  w)  —  2 C(w,  w,  m)l  . 

6 

By  the  use  of  directional  derivatives  it  is  then  possible  to  evaluate  the  bilinear  and  trilinear 
functions  B(x,  y)  and  C(x,  y.  z)  on  any  set  of  coinciding  real  vectors.  These  derivatives  can  be 
approximated  using  finite  differences, 

1  , 

B{n,  v)  =  [R(w0  +  hv,  fi0)  -  R(w0  -  hv,  y0)}  +  0(h6 )  (9) 

and 

C(v,  v,  v)  =  [— R3  +  8R2  —  13Ri  +  13R_i  —  8R—2  +  R— 3]  +  0(/r4)  (10) 

where  h  is  small  and  R/  =  R(wo  +  Ihv,  y, 0). 

The  evaluation  of  these  finite  differences  suffers  from  truncation  error  for  values  of  h  which  arc 
too  large,  and  from  rounding  error  for  values  which  arc  too  small.  To  overcome  the  second  limit, 
quad-double  arithmetic  was  used  for  the  residual  evaluation.  A  high  precision  version  of  algebraic 
and  transcendental  functions  is  also  be  required,  in  this  case  because  of  the  contributions  of  such 
functions  in  Osher’s  flux  function.  The  QD  library  [23]  was  used  to  obtain  this  functionality.  This 
library  allows  extension  of  existing  code  to  double-double  precision  and  quad-double  precision 
without  major  recoding,  by  using  operator  overloading. 

To  test  the  accuracy  of  the  reduced  model  terms,  the  convergence  of  the  third  Jacobian  term, 
for  the  second  order  spatial  discretisation,  with  h  is  considered.  The  third  Jacobian  term  is  more 
sensitive  to  rounding  error  than  the  second  term,  and  also  the  parameter  terms.  The  convergence 
is  shown  in  table  2  and  there  is  a  good  range  of  values  for  h  where  satifactory  values  are  obtained. 

7.3  Limit  Cycle  Predictions 

The  model  reduction  was  exercised  on  the  coarse  grid  using  the  first  order  spatial  scheme.  The 
onset  angle  was  previously  computed  as  24.3  degrees.  The  model  coefficients  were  computed  and 
the  two  degree  of  freedom  model  was  run  for  different  values  of  a  =  a  —  24.3.  The  comparison 
with  the  full  order  model  is  shown  in  figure  8. 
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31  degrees 

Figure  8:  Comparison  of  the  response  of  the  full  order  and  reduced  order  models  on  the  coarse  grid 
using  a  first  order  spatial  discretisation. 
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h  Re  (q,  C{ p,  p,  p)) 

4  5. 53477001 26789234e-01 

6  3.55058628371 9 1756e+02 
8  3. 1526220567 128992e+04 
10  1.914785 1302829609e-04 

12  1.914785 1302829441e-04 

14  1.9 14785 13028 17 110e-04 

16  1.9 1478589 1 28 14422e-04 


Im  (q,C(p,p,p)) 
6.2858220561002369e+00 
2.3350599906148987e+03 
1.6834102453444750e+05 
1.1694372975048235e-02 
1.1694372975048255e-02 
1. 169437297504805 le-02 
1.1694373075537749e-02 


Table  2:  Convergence  of  reduced  order  model  third  Jacobian  contribution  with  finite  difference  step 
size.  These  results  are  on  the  coarse  grid  using  the  second  order  spatial  discretisation. 

Finally,  the  reduced  model  predictions  for  the  second  order  spatial  scheme  on  the  coarse  grid 
were  calculated.  The  growth  of  the  limit  cycle  oscillation  compared  with  time  domain  results  is 
shown  in  figure  7.3.  For  the  full  order  predictions  the  calculations  took  1-2  days  to  settle  to  a  limit 
cycle  oscillation.  The  reduced  model  predictions  were  obtained  almost  instantaneously  for  any 
angle  after  the  reduced  model  ccoefficient  were  derived,  an  operation  that  takes  no  more  than  30 
minutes  using  a  laptop  for  this  grid.  The  comparison  is  excellent  within  a  few  degrees  of  the  wing 
rock  onset  angle  and  gradually  becomes  worse  as  the  angle  of  attack  becomes  larger.  It  is  expected 
that  the  Taylor  series  and  centre  manifold  approximations  will  become  worse  as  we  move  away 
from  the  bifurcation  point.  However,  the  most  useful  information  is  close  to  the  onset  angle,  and 
the  reduced  model  tells  use  the  expected  amplitude  immediately  after  the  onset  angle  is  passed. 


8  Conclusions 

Two  achievements  have  been  reported.  First,  a  parallel  version  of  an  eigenvalue  based  stability 
prediction  method  has  been  demonstrated.  The  key  development  was  to  use  a  polynomial  based 
preconditioner  on  top  of  a  localised  BILU  factorisation  to  reduce  the  effect  of  the  localisation  on 
the  convergence  of  the  linear  solver.  Calculations  were  successfully  carried  out  on  the  medium 
delta  wing  grid  on  16  processors.  The  performance  of  these  calculations  showed  that  the  eigen¬ 
value  at  each  angle  of  incidence  could  be  computed  in  a  computational  cost  between  2  and  3  times 
the  cost  of  a  steady  state  CFD  calculation.  It  seems  likely  that  better  forms  of  the  polynomial  pre¬ 
conditioner  could  be  developed  that  take  advantage  of  the  fact  that  we  know  the  eigenvalue  that 
the  IPM  is  converging  to  is  also  an  eigenvalue  of  the  coefficient  matrix  if  the  linear  system.  It 
seems  likely  that  this  slows  down  the  convergence  of  the  linear  solver. 

Secondly,  the  application  of  the  projected  reduced  order  model  to  wing  rock  limit  cycle  pre¬ 
diction  was  made.  Key  difficulties  for  the  wing  rock  problem  was  the  need  to  use  the  angle  of 
attack  as  the  bifurcation  parameter.  This  means  that  the  parameters  terms  in  the  reduced  model 
needed  to  include  derivatives  of  the  CFD  residual  function  and  its  Jacobian  with  the  respect  to 
the  angle  of  attack.  The  predictions  from  the  two  degree  of  freedom  ROM  of  the  limit  cycle  os¬ 
cillation  amplitudes  were  in  excellent  agreement  with  calculations  using  the  full  order  system,  at 
dramatically  reduced  computational  cost. 

Future  work  would  involve  applying  these  techniques  to  full  sized  aircraft  problems,  and  to 
integrating  the  methods  into  a  framework  for  parametric  searches  for  stability  (both  aeroelastic 
and  flight  mechanics),  including  the  assessment  of  uncertainty. 
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Figure  9:  Comparison  of  the  predicted  Wing  Rock  amplitudes  of  the  full  order  and  reduced  order 
models  on  the  coarse  grid  using  a  first  order  spatial  discretisation. 
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